ABSTRACT

We present a new method for quasar target selection using photometric fluxes and a Bayesian
probabilistic approach. For our purposes we target quasars using Sloan Digital Sky Survey (SDSS)
photometry to a magnitude limit of g = 22. The efficiency and completeness of this technique
are measured using the Baryon Oscillation Spectroscopic Survey (BOSS) data, taken in 2010. This
technique was used for the uniformly selected (CORE) sample of targets in BOSS year-one spectroscopy,
to be released in the ninth SDSS data release. When targeting at a density of 40 objects per deg² (the
BOSS quasar targeting density), the efficiency of this technique in recovering z > 2.2 quasars is 40%.
The completeness compared to all quasars identified in BOSS data is 65%. This paper also describes
possible extensions and improvements for this technique.

Subject headings: cosmology: observations, large-scale structure of universe, quasars: general, surveys, 
galaxies: distances and redshift, methods: statistical, stars: general, statistics 



1. INTRODUCTION 

The SDSS-III Baryon Oscillation Spectroscopic Survey (BOSS; Eisenstein et al. 2011) is specifically targeting z > 2.2 QSOs in order to observe 150,000 Lyman-α forest (LyαF) lines of sight. The key aim of BOSS is to measure the absolute cosmic distance scale and expansion rate with percent-level precision at three distinct cosmological epochs: redshifts z = 0.3 and 0.6 using luminous red galaxies (LRGs), and z ≈ 2.5 using the LyαF as the density tracer. For both the galaxy and LyαF samples, the primary distance measure is the baryon acoustic oscillation (BAO) technique (Schlegel et al. 2007, 2009; Slosar et al. 2011). BOSS dedicates 40 fibers deg⁻² to QSO target selection for measuring the BAO signal.

Previous quasar surveys, such as the Sloan Digital Sky Survey (SDSS; Schneider et al. 2010) and the 2dF QSO Redshift Survey (2QZ; Croom et al. 2004) on the Anglo-Australian Telescope Two-Degree Field (2dF), have historically performed quasar target selection by searching for relatively bright quasars (i < 19.1 for z < 3 objects in SDSS). However, previous methods, such as the traditional "UV Excess" (UVX; selecting star-like objects with unusually blue broadband colors; Sandage 1965), "color boxes" (Croom et al. 2009) and "Kernel Density Estimators" (Richards et al. 2004), begin to fail at fainter magnitudes because photometric errors broaden the stellar locus, leading to potential incompleteness and inefficiency in target selection. This motivated our development of a selection technique which better handles the photometric flux errors as one approaches the flux limit.

Furthermore, at redshifts between z = 2.5 and z = 3.0, broadband optical color selections fail, since the colors of these quasars are similar to those of stars (in particular early A and F stars) as the quasars "pass over the stellar locus" (Fan 1999; Richards et al. 2002), where the quasar photometric colors are the same as the stellar colors (see Section 3.2 for more details). Simultaneously, quasars become much fainter; e.g., an M_g = −23 quasar at z = 2 has a g-band magnitude of ~21.7, which is close to the SDSS single-epoch magnitude limit.

The BOSS LyαF/Quasar Survey will target objects thought to be z > 2.2 quasars to perform a LyαF BAO measurement. Since the foreground LyαF is independent of the intrinsic properties of the background quasar, there is freedom to use multiple selection methods without biasing the BAO results. The methods used for BOSS targeting include the "Kernel Density Estimator" (KDE; Richards et al. 2004), an "Extreme-Deconvolution" method (XDQSO; Bovy et al. 2011), and a Neural Network method (NN; Yèche et al. 2010). The BOSS QSO target selection used for the first year of observations (Ross et al. 2011a) combines all these different methods, including the Likelihood method described in this paper, with different photometric catalogs such as SDSS (York et al. 2000), UKIDSS (Lawrence et al. 2007) and GALEX (Martin et al. 2005), and with quasars found using their flux time-variability information (Palanque-Delabrouille et al. 2011).



In this paper we describe a new method for quasar target selection. Our method models data in 5-filter flux space, then calculates likelihood estimates that a given object is a z > 2.2 quasar. Because a given survey has a finite number of spectroscopic fibers (observing time allocation) to dedicate towards quasar targeting, this method prioritizes selection by calculating, from these likelihoods, a probability that a potential target is a quasar. Targets are ranked by this probability. The method differs from KDE in that it incorporates the photometric errors of each object into the likelihood calculations; also, KDE imposes only a single magnitude prior and color distribution, whereas we model the QSO density as a function of magnitude and the evolution of the color distribution.

The layout of this paper is as follows. Section 2 describes the method used to calculate the likelihoods, and the training catalogs that are generated and used for Likelihood target selection. In Section 3 we give an overview of the BOSS data and the performance of the Likelihood method on these data. In Section 4 we discuss testing and optimization of the method, as well as future work and possible improvements. We use the terms "quasar" and "QSO" interchangeably to refer to quasi-stellar, type-I broad-line objects. All Right Ascensions (RA) and Declinations (Dec) discussed are J2000.

Our Likelihood method was used for the uniformly selected sample (which we refer to hereafter as "CORE") in the first year of the BOSS QSO target selection (Ross et al. 2011a), and we intend to release our calculated likelihood probabilities as a data product in future BOSS Data Releases (the first such release is SDSS Data Release 9).

2. METHOD AND CATALOG GENERATION 
2.1. Likelihood Method 

Recent work has approached target selection within a Bayesian statistical framework rather than with more traditional color-box approaches (Richards et al. 2004; Bovy et al. 2011; Mortlock et al. 2011). Spectroscopic target selection can be viewed as a classification problem. Given a set of photometric target objects (O) with attributes (a) and a discrete set of astronomical object classes, one would like to assign a target to a particular class. For our purposes we are simply interested in the question: "Is the object a quasar?" Thus we have two classes: quasar (QSO) and non-quasar (i.e., all other observable objects: stars + galaxies + anything else), hereafter referred to as Everything Else (EE).

The probability that an object O is a quasar (in class QSO) given a vector of object attributes a is provided by Bayes' theorem (Sivia & Skilling 2006):

$$
\mathcal{P}(O \in \mathrm{QSO} \mid \mathbf{a}) = \frac{\mathcal{P}(\mathbf{a} \mid O \in \mathrm{QSO})\,\mathcal{P}(O \in \mathrm{QSO})}{\mathcal{P}(\mathbf{a})} \tag{1}
$$

where P(a | O ∈ QSO) is the conditional probability of observing attributes a given that object O is a quasar; P(O ∈ QSO) is the prior probability that O is a quasar (prior in the sense that it does not take into account any information about the object attributes); and P(a) is the marginal probability of an object with attributes a occurring at all, which acts as a normalizing constant. In our case:

$$
\mathcal{P}(\mathbf{a}) = \mathcal{P}(\mathbf{a} \mid O \in \mathrm{QSO})\,\mathcal{P}(O \in \mathrm{QSO}) + \mathcal{P}(\mathbf{a} \mid O \in \mathrm{EE})\,\mathcal{P}(O \in \mathrm{EE}) \tag{2}
$$

because QSO ∪ EE contains all possible classifications (or outcomes) for object O.
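For the two-class case, Eqns. (1) and (2) reduce to a single ratio. A minimal Python sketch, assuming the two class likelihoods and priors have already been computed as plain numbers; the function name `posterior_qso` is hypothetical and not part of any BOSS software:

```python
def posterior_qso(like_qso: float, like_ee: float,
                  prior_qso: float, prior_ee: float) -> float:
    """Two-class Bayes' theorem: P(O in QSO | a) from Eqns. (1) and (2)."""
    # Numerator of Eqn. (1): P(a | O in QSO) * P(O in QSO)
    numerator = like_qso * prior_qso
    # Eqn. (2): P(a) summed over the two exhaustive classes, QSO and EE
    evidence = numerator + like_ee * prior_ee
    if evidence <= 0.0:
        raise ValueError("P(a) must be positive for the posterior to be defined")
    return numerator / evidence


# Example: QSO likelihood twice the EE likelihood, equal priors
print(posterior_qso(0.2, 0.1, 0.5, 0.5))  # ~0.667
```

Because the EE class absorbs everything that is not a quasar, the two posteriors always sum to one, so only the QSO posterior needs to be stored for ranking.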

We use the term "likelihood" to denote the conditional probabilities P(a | O ∈ QSO) and P(a | O ∈ EE) in Eqns. (1) and (2). When the attributes of a target object are measured with a significant amount of uncertainty, one can regard a as a noisy measurement of an underlying true attribute vector a'. We can then calculate the likelihood by marginalizing over all possible values of a':¹

$$
\begin{aligned}
\mathcal{P}(\mathbf{a} \mid O \in \mathrm{QSO}) &= \int \mathcal{P}(\mathbf{a}, \mathbf{a}' \mid O \in \mathrm{QSO})\, d\mathbf{a}' \\
&= \int \mathcal{P}(\mathbf{a} \mid \mathbf{a}', O \in \mathrm{QSO})\, \mathcal{P}(\mathbf{a}' \mid O \in \mathrm{QSO})\, d\mathbf{a}' \\
&= \int \mathcal{P}(\mathbf{a} \mid \mathbf{a}')\, \mathcal{P}(\mathbf{a}' \mid O \in \mathrm{QSO})\, d\mathbf{a}'
\end{aligned} \tag{3}
$$

For our purposes, P(a' | O ∈ QSO) is just the empirical distribution observed in a discrete set of high signal-to-noise objects which are already classified as either quasars or non-quasars. The attributes are the photometric fluxes (f) in the five SDSS filters (f = {u, g, r, i, z}), whose measurement errors are treated as independent of each other. Because the empirical distribution places a δ-function at each training example in Eqn. (3), the integral becomes a sum over all objects (O') with attributes a' in the training sets:

$$
\int \mathcal{P}(\mathbf{a} \mid \mathbf{a}')\, \mathcal{P}(\mathbf{a}' \mid O \in \mathrm{QSO})\, d\mathbf{a}' \approx \sum_{O' \in \mathrm{QSO}} \mathcal{P}(\mathbf{a} \mid \mathbf{a}') \tag{4}
$$



Like other recent publications (Bovy et al. 2011; Mortlock et al. 2011), we use a Gaussian distribution, P(a | a'), for the uncertainties of the attributes. Thus a Gaussian distribution is used for the errors (σ_f) in the object fluxes, with f and f' denoting one of the target object attributes (a) and the corresponding training object attribute (a'), respectively. The likelihood (L) for a single flux f then becomes:

$$
\mathcal{L} = \mathcal{P}(f \mid O \in \mathrm{QSO\ or\ EE}) \approx \sum_{O'} \frac{1}{\sqrt{2\pi}\,\sigma_f} \exp\!\left[-\frac{(f - f')^2}{2\sigma_f^2}\right] \tag{5}
$$



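When we consider all five SDSS fluxes, the single-flux Gaussian inside the sum is replaced by a product over the five filters, and Eqn. (5) becomes:

$$
\mathcal{L} = \mathcal{P}(\{u,g,r,i,z\} \mid O \in \mathrm{QSO\ or\ EE}) \approx \sum_{O'} \prod_{f=u,g,r,i,z} \frac{1}{\sqrt{2\pi}\,\sigma_f} \exp\!\left[-\frac{(f - f')^2}{2\sigma_f^2}\right] \tag{6}
$$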



Note that the above equations become equalities only when the training catalogs completely represent the object flux space. For our target object fluxes (f), we used SDSS photometric PSF fluxes from the SkyServer (http://www.sdss3.org/dr8/) under the standard SDSS data releases.
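The kernel sum of Eqns. (5) and (6) can be sketched in plain Python, assuming that the fluxes of one target and of each training object are five-element sequences in the same (u, g, r, i, z) order and units. The function name and signature are illustrative only:

```python
import math
from typing import Sequence


def flux_likelihood(target_flux: Sequence[float],
                    target_err: Sequence[float],
                    training_fluxes: Sequence[Sequence[float]],
                    normalize: bool = True) -> float:
    """Kernel sum of Eqn. (6): sum over training objects O' of a product of
    one-dimensional Gaussians, one per filter.

    Only the target's flux errors sigma_f appear in the kernel; the training
    (co-added) flux errors are neglected, as described in the text.
    """
    total = 0.0
    for train in training_fluxes:
        term = 1.0
        for f, sigma, f_prime in zip(target_flux, target_err, train):
            term *= math.exp(-((f - f_prime) ** 2) / (2.0 * sigma ** 2))
            if normalize:
                # Gaussian normalization 1 / (sqrt(2 pi) sigma_f) from Eqn. (5)
                term /= math.sqrt(2.0 * math.pi) * sigma
        total += term
    return total


# Toy example with five-band fluxes in arbitrary, consistent units
target = [1.0, 2.0, 2.5, 2.7, 3.0]
errors = [0.5, 0.1, 0.1, 0.2, 0.6]
catalog = [[1.1, 2.0, 2.4, 2.8, 3.1], [4.0, 5.0, 5.5, 5.8, 6.0]]
print(flux_likelihood(target, errors, catalog))
```

This loop costs of order N_train × 5 operations per target; with roughly 10⁶ EE objects, a practical implementation would likely vectorize the sum (for example with NumPy arrays), which this sketch does not attempt.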

All fluxes and their errors are corrected for Galactic extinction in the SDSS filters using the prescription of Schlegel et al. (1998). Because the sum is done in flux space rather than color space, object errors are independent. Also, our method preserves the luminosity function information, whereas the absolute flux information is lost when using colors. Our training catalogs use stacked (co-added) fluxes (see Sections 2.3 and 2.4), whereas the targets are single-epoch fluxes (see Section 3.2). Therefore, in the above equations the errors in the catalog fluxes (f') are ignored, because the signal-to-noise ratio of the catalog fluxes is much greater than that of our potential target fluxes (f).

¹ In the derivation of Eqn. (3), we assume the noisy observation a is independent of the classification of O given a', therefore P(a | a', O ∈ QSO) = P(a | a').

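The QSO likelihood can be separated into redshift bins (Δz) so that we can tune Eqn. (1) to a desired target redshift range. This simply requires having redshifts for the objects (O') in the QSO training catalog and subdividing these data into redshift bins. We used a width Δz = 0.1 (e.g., 0.5–0.6, 0.6–0.7, ..., 4.9–5.0). This results in the following final equations for the QSO and EE likelihoods:

$$
\mathcal{L}_{\mathrm{QSO}}(\Delta z) = \mathcal{P}(\mathbf{f} \mid O \in \mathrm{QSO}(\Delta z)) \propto \sum_{O' \in \mathrm{QSO}(\Delta z)} \prod_{f=u,g,r,i,z} \exp\!\left[-\frac{(f - f')^2}{2\sigma_f^2}\right] \tag{7}
$$

$$
\mathcal{L}_{\mathrm{EE}} = \mathcal{P}(\mathbf{f} \mid O \in \mathrm{EE}) \propto \sum_{O' \in \mathrm{EE}} \prod_{f=u,g,r,i,z} \exp\!\left[-\frac{(f - f')^2}{2\sigma_f^2}\right] \tag{8}
$$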
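Splitting the QSO kernel sum of Eqn. (7) into Δz = 0.1 bins only requires tagging each training object with its bin. A sketch, assuming the QSO training catalog is an iterable of (redshift, fluxes) pairs; the Gaussian normalization is omitted as in Eqn. (7), and the helper names are hypothetical. The EE sum of Eqn. (8) is the same loop without the binning.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple

DZ = 0.1  # redshift bin width quoted in the text


def bin_index(z: float, dz: float = DZ) -> int:
    """Integer index k of the bin [k*dz, (k+1)*dz) that contains z.

    A tiny epsilon guards against floating-point results such as 0.6 / 0.1 = 5.999...
    """
    return int(math.floor(z / dz + 1e-9))


def binned_qso_likelihood(target_flux: Sequence[float],
                          target_err: Sequence[float],
                          qso_catalog: Iterable[Tuple[float, Sequence[float]]],
                          dz: float = DZ) -> Dict[int, float]:
    """Eqn. (7): one unnormalized kernel sum L_QSO(Delta z) per redshift bin.

    qso_catalog yields (redshift, fluxes) pairs for the training objects O'.
    """
    sums: Dict[int, float] = defaultdict(float)
    for z, train in qso_catalog:
        term = 1.0
        for f, sigma, f_prime in zip(target_flux, target_err, train):
            term *= math.exp(-((f - f_prime) ** 2) / (2.0 * sigma ** 2))
        sums[bin_index(z, dz)] += term
    return dict(sums)
```

Keeping the bins separate, rather than summing them immediately, is what later allows the numerator of the final probability to be restricted to a target redshift range.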



The Gaussian normalizations add a multiplicative constant, ∏_f 1/(√(2π) σ_f), to each likelihood (L). Because this constant depends only on the target's flux errors, it is the same for both Eqn. (7) and Eqn. (8) for a given target, and it cancels when calculating the probabilities in Eqn. (1).
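This cancellation can be checked numerically. The toy sketch below, with hypothetical two-band catalogs and areas, computes the area-weighted QSO fraction once with and once without the Gaussian normalization; the two results agree to floating-point precision because every kernel term carries the same factor.

```python
import math
from typing import Sequence


def kernel_sum(f: Sequence[float], sig: Sequence[float],
               catalog: Sequence[Sequence[float]], normalize: bool) -> float:
    """Sum over catalog objects of a product of per-filter Gaussians."""
    total = 0.0
    for train in catalog:
        term = 1.0
        for fi, si, ti in zip(f, sig, train):
            term *= math.exp(-((fi - ti) ** 2) / (2.0 * si ** 2))
            if normalize:
                term /= math.sqrt(2.0 * math.pi) * si
        total += term
    return total


def qso_fraction(f, sig, qso_cat, ee_cat, area_qso, area_ee, normalize):
    """(L_QSO / A_QSO) / (L_QSO / A_QSO + L_EE / A_EE) for a single target."""
    l_qso = kernel_sum(f, sig, qso_cat, normalize) / area_qso
    l_ee = kernel_sum(f, sig, ee_cat, normalize) / area_ee
    return l_qso / (l_qso + l_ee)


target, errors = [1.0, 2.0], [0.5, 0.5]
qso_cat = [[1.1, 2.1], [0.8, 2.4]]
ee_cat = [[1.5, 1.5], [3.0, 2.0], [1.0, 2.2]]
with_norm = qso_fraction(target, errors, qso_cat, ee_cat, 10.0, 5.0, True)
without_norm = qso_fraction(target, errors, qso_cat, ee_cat, 10.0, 5.0, False)
# The factor prod_f 1/(sqrt(2 pi) sigma_f) multiplies every term, so it cancels.
print(with_norm, without_norm)
```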

The prior probabilities in Eqn. (1) and Eqn. (2) are the relative surface densities of quasars and everything else on the sky, and thus normalize Eqn. (7) and Eqn. (8). We do this by defining the prior probabilities to be the inverse of the effective sky area (A) of the QSO and EE catalogs:

$$
\mathcal{P}(O \in \mathrm{QSO}) = \frac{1}{A_{\mathrm{QSO}}}, \qquad \mathcal{P}(O \in \mathrm{EE}) = \frac{1}{A_{\mathrm{EE}}} \tag{9}
$$
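As a worked illustration (our own arithmetic using numbers quoted in Section 2.4, not an intermediate result of the method): if every kernel term in Eqn. (8) were set to unity, the EE term would reduce to the raw surface density of the EE catalog, which contains 1,042,262 objects over an effective area of 150 deg²:

$$
\frac{1}{A_{\mathrm{EE}}}\sum_{O' \in \mathrm{EE}} 1 = \frac{1\,042\,262}{150\ \mathrm{deg}^2} \approx 6.95 \times 10^{3}\ \mathrm{deg}^{-2}
$$

Dividing each kernel sum by its catalog's area in this way places the QSO and EE terms on a common per-square-degree footing, even when the two catalogs have different effective areas.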



By inserting Eqns. (7), (8) and (9) into Eqn. (1), we obtain the probability that a single potential target object (O) is a quasar (QSO) in a target redshift range (Δz_target):

$$
\mathcal{P}(O \in \mathrm{QSO}(\Delta z_{\mathrm{target}}) \mid \mathbf{f}) = \frac{\displaystyle \sum_{\Delta z_{\mathrm{target}}} \frac{\mathcal{L}_{\mathrm{QSO}}(\Delta z)}{A_{\mathrm{QSO}}}}{\displaystyle \sum_{\Delta z_{\mathrm{all}}} \frac{\mathcal{L}_{\mathrm{QSO}}(\Delta z)}{A_{\mathrm{QSO}}} + \frac{\mathcal{L}_{\mathrm{EE}}}{A_{\mathrm{EE}}}} \tag{10}
$$

In the numerator, L_QSO(Δz) is summed over the desired quasar target redshift range (Δz_target), whereas the denominator contains all objects in both catalogs, summed over the entire redshift range (Δz_all). This probability is exact in the limit of perfect training catalogs (infinite objects and zero errors). A probability is calculated for every potential target using the full QSO and EE catalogs.
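Given the per-bin QSO sums and the EE sum, Eqn. (10) is a ratio of area-weighted totals. A sketch, assuming bins are keyed by an integer index k covering [0.1k, 0.1(k+1)), so that k ≥ 22 stands in for z > 2.2; that mapping, the zero return for an empty denominator, and the function name are choices of this illustration rather than details given in the text.

```python
from typing import Dict


def qso_probability(l_qso_bins: Dict[int, float], l_ee: float,
                    area_qso: float, area_ee: float,
                    k_min: int = 22) -> float:
    """Eqn. (10) for one target.

    l_qso_bins maps bin index k (bin [0.1*k, 0.1*(k+1))) to L_QSO(Delta z_k);
    bins with k >= k_min form Delta z_target (k_min = 22 means z >= 2.2).
    l_ee is L_EE from Eqn. (8), computed with the same kernel convention.
    """
    numerator = sum(l for k, l in l_qso_bins.items() if k >= k_min) / area_qso
    denominator = sum(l_qso_bins.values()) / area_qso + l_ee / area_ee
    if denominator == 0.0:
        # No training object near the target in flux space: treat as non-QSO.
        return 0.0
    return numerator / denominator


# Toy values: one low-z bin, one high-z bin, and an EE sum over twice the area
print(qso_probability({15: 1.0, 25: 3.0}, l_ee=6.0, area_qso=1.0, area_ee=2.0))
```

Low-redshift QSO bins still appear in the denominator, so a target that resembles a z < 2.2 quasar is correctly pushed towards a low probability rather than being counted as "everything else".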



2.2. Imaging Data 

BOSS uses the same imaging data as that of the original SDSS-I/II survey (York et al. 2000), with an extension in the South Galactic Cap (SGC). These data were gathered using a dedicated 2.5 m wide-field telescope (Gunn et al. 2006) to collect light for a camera with 30 2k×2k CCDs (Gunn et al. 1998) over five broad bands, ugriz (Fukugita et al. 1996); this camera has imaged 14,555 deg² of the sky, including 7,500 deg² in the North Galactic Cap (NGC) and 3,100 deg² in the SGC (Aihara et al. 2011). The imaging data were taken on dark photometric nights of good seeing (Hogg et al. 2001), and objects were detected and their properties were measured (Lupton et al. 2001; Stoughton et al. 2002) and calibrated photometrically (Smith et al. 2002; Ivezić et al. 2004; Tucker et al. 2006; Padmanabhan et al. 2008) and astrometrically (Pier et al. 2003).

Padmanabhan et al. (2008) present an algorithm which uses overlaps between SDSS imaging scans to photometrically calibrate the SDSS imaging data. BOSS target selection uses these "ubercalibrated" data from the SDSS Data Release Eight (DR8) database (Aihara et al. 2011). The 2.5° stripe along the celestial equator in the Southern Galactic Cap, commonly referred to as "Stripe 82", was imaged multiple times, for up to 80 epochs spanning a 10-year baseline (Abazajian et al. 2009). A co-addition of these data (Adelman-McCarthy et al. 2008) goes roughly two magnitudes fainter than the single-scan images which make up the bulk of the SDSS imaging data.

2.3. QSO Catalog 

Because there are relatively few previously observed quasars in the desired BOSS redshift range (z > 2.2) with sufficiently small flux errors to precisely describe the quasar color locus, the QSO Catalog is generated by a Monte Carlo technique (Hennawi et al. 2010) to provide a less biased and more complete sample than is available from the SDSS quasar catalog. The Monte Carlo simulation uses a model of the quasar luminosity function based on the studies by Jiang et al. (2006) to compute the density of quasars as a function of redshift and i-magnitude. The Jiang et al. (2006) luminosity function is used because it extends fainter than the luminosity function of Richards et al. (2006) and thus better matches the high-redshift quasars in the BOSS redshift regime. SDSS Data Release 5 spectroscopically confirmed quasars (DR5QSO; Schneider et al. 2007) are the photometric inputs to the Monte Carlo. The simulation generates 9.94 million unique (i-magnitude, redshift) pairs down to i = 22.5 (0.5 mag fainter than the BOSS magnitude limit), with a distribution given by the luminosity function. Each simulated quasar (O') is then matched to the SDSS quasar (DR5QSO) with the nearest redshift to O'. The SDSS photometry of the DR5QSO quasar is rescaled such that its i-magnitude matches that of the simulated quasar (O'). We assume that quasar colors are not a function of magnitude in the redshift range of interest, and thus can be extrapolated in this manner to deeper fluxes. This technique therefore preserves the relative fluxes while providing more complete coverage of the flux space than using only known SDSS quasars. Objects with redshifts in the range desired for BOSS targeting (z > 2.2) are included in the numerator of Eqn. (10). The location in ugr color-color space of the z > 2.2 objects in the QSO Catalog is shown by the blue contours in Fig. 1.
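A minimal sketch of the Monte Carlo rescaling step, assuming the simulated (redshift, i-magnitude) pairs are already drawn from the luminosity function and that the DR5 template quasars are stored as (redshift, i-magnitude, five fluxes) tuples sorted by redshift. It also assumes Pogson magnitudes, m = −2.5 log₁₀ f + const, so a magnitude shift Δm corresponds to multiplying every flux by 10^(−0.4 Δm); SDSS asinh magnitudes deviate from this at the faintest fluxes. All names are hypothetical.

```python
import bisect
from typing import List, Sequence, Tuple


def nearest_redshift_index(sorted_z: Sequence[float], z: float) -> int:
    """Index of the entry of the ascending list sorted_z closest to z."""
    j = bisect.bisect_left(sorted_z, z)
    if j == 0:
        return 0
    if j == len(sorted_z):
        return len(sorted_z) - 1
    # Choose whichever neighbour is closer in redshift
    return j if sorted_z[j] - z < z - sorted_z[j - 1] else j - 1


def rescale_to_imag(fluxes: Sequence[float], i_template: float,
                    i_target: float) -> List[float]:
    """Multiply all fluxes by one factor so the i magnitude becomes i_target.

    A single factor preserves every flux ratio (i.e., every color).
    """
    scale = 10.0 ** (-0.4 * (i_target - i_template))
    return [f * scale for f in fluxes]


def simulate_photometry(simulated: Sequence[Tuple[float, float]],
                        templates: Sequence[Tuple[float, float, Sequence[float]]]
                        ) -> List[Tuple[float, List[float]]]:
    """simulated: (redshift, i_mag) pairs drawn from the luminosity function.
    templates: (redshift, i_mag, fluxes) for DR5 quasars, sorted by redshift.
    """
    template_z = [t[0] for t in templates]
    out = []
    for z, i_mag in simulated:
        _, i_t, fluxes = templates[nearest_redshift_index(template_z, z)]
        out.append((z, rescale_to_imag(fluxes, i_t, i_mag)))
    return out
```

Because one scale factor is applied to all five bands, the colors of each template quasar are carried over unchanged, which is exactly the "colors are not a function of magnitude" assumption stated above.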

2.4. Everything Else Catalog 

The Everything Else (EE) Catalog is generated using stacked SDSS "Stripe 82" imaging, allowing the construction of a large point-source catalog with variability information and smaller errors than is possible using single-epoch SDSS imaging. Stripe 82 is the 2.5° wide region on the celestial equator between RA = −45° and RA = 45° that SDSS repeatedly scanned. Non-photometric data were ignored, and the photometric images were processed with a version of the SDSS photometric reduction pipeline similar to that in Data Release Eight (Aihara et al. 2011). The photometric depth is r ≈ 22.5 mag (5σ) for point sources, with high completeness and accurate star-galaxy separation to r ≈ 22 mag. These data were combined at the catalog level to produce co-added PSF photometry. Typically 20 observations were included for each object, resulting in a co-added catalog with typical errors of 6.1%, 2.4%, 3.0%, 7.1% and 27% at 22nd magnitude in the u, g, r, i and z filters, respectively.

Fig. 1. Contour plot of the u − g and g − r colors of the Everything Else (red) and QSO (blue) Catalogs. The region of overlap, where target selection becomes difficult, is at u − g ≈ 1 and g − r ≈ 0. The error bars are the SDSS single-epoch g − r and u − g magnitude errors at g = 22 (black) and g = 20 (grey).

The EE Catalog is further trimmed to a clean sample of non-variable point sources for inclusion in the likelihood calculations of Eqn. (10). The 23.9% of objects that are blended with neighboring objects are rejected, reducing the effective footprint of this catalog from 225 deg² to 171.2 deg². Objects with high variability are explicitly excluded from the catalog under the presumption that these are dominated by quasars (Schmidt et al. 2010), and we explicitly add quasars into the numerator and denominator of Eqn. (10) such that the computed probability remains in the range [0, 1]. These variable objects are identified as those with a reduced χ² of the fit to a constant r-band flux exceeding 1.4. This reduces the catalog to an effective area of 150 deg². The result is a catalog with 1,042,262 photometric fluxes that represent all non-quasar types of objects. By comparing this catalog with objects for which we have spectra, we determined that the contamination of this set by z > 2.2 quasars is less than 0.5%. In Fig. 1 the red contours show the ugr color-color distribution of the objects in the EE catalog.
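The variability cut can be sketched as a reduced χ² test against a constant r-band flux, assuming each object's light curve is available as per-epoch fluxes with 1σ errors. The inverse-variance weighted mean as the fitted constant and the helper names are assumptions of this sketch; the text specifies only the statistic and the 1.4 threshold.

```python
from typing import Sequence


def reduced_chi2_constant(flux: Sequence[float], err: Sequence[float]) -> float:
    """Reduced chi^2 of a constant-flux fit to a multi-epoch light curve.

    The best-fitting constant is the inverse-variance weighted mean; one
    parameter is fitted, so there are N - 1 degrees of freedom.
    """
    if len(flux) < 2 or len(flux) != len(err):
        raise ValueError("need at least two epochs with matching errors")
    weights = [1.0 / e ** 2 for e in err]
    mean = sum(w * f for w, f in zip(weights, flux)) / sum(weights)
    chi2 = sum(w * (f - mean) ** 2 for w, f in zip(weights, flux))
    return chi2 / (len(flux) - 1)


def is_variable(flux: Sequence[float], err: Sequence[float],
                threshold: float = 1.4) -> bool:
    """Flag objects whose r-band light curve is inconsistent with a constant."""
    return reduced_chi2_constant(flux, err) > threshold


print(is_variable([10.0, 10.1, 9.9, 10.0], [0.1, 0.1, 0.1, 0.1]))  # False
print(is_variable([10.0, 12.0, 8.0, 10.0], [0.1, 0.1, 0.1, 0.1]))  # True
```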

3. BOSS DATA & LIKELIHOOD PERFORMANCE 

3.1. BOSS Stripe 82 Data 

In September 2009, BOSS started taking spectroscopic data. During the first year of data taking, several target selection methods were employed. In addition to the likelihood method, three other selection techniques were deployed: the KDE method, developed to classify quasars by separating them from stars in color space (Richards et al. 2004); an "extreme-deconvolution" method (Bovy et al. 2011; Section 4.4); and a new approach based on artificial neural networks (Yèche et al. 2009). Previously spectroscopically confirmed quasars, as well as objects with high variability (Palanque-Delabrouille et al. 2011) over consecutive Stripe 82 runs, were also targeted during this time.

Stripe 82 target selection used co-added catalog data from SDSS as the potential target fluxes. Because the co-added photometry has a higher signal-to-noise ratio than any single-epoch data run, and the target fiber density in this region was higher than in the rest of the survey, BOSS QSO completeness is highest in this region. Once observed, all of the quasar targets were automatically classified and then visually examined.

Based on the objects selected in Stripe 82, we found that the performances of the four methods were not identical as a function of the magnitude and redshift of the objects (Ross et al. 2011a). This behavior is likely due to the different strategies adopted in training the methods.

3.2. Likelihood Performance 

Although we targeted a number of tiles for spectroscopy during the first year of data taking, observational success varied. Due to a combination of poor observing conditions and equipment glitches, spectroscopic completeness (the fraction of total spectroscopic observations in a tiling region which yielded a high-confidence spectroscopic identification upon visual inspection) was a strong function of the region in which a target was tiled. In this paper, we only test our method using observations in Stripe 82 regions with a spectroscopic completeness of > 90%. In Fig. 2 we show the tiles used for testing.

To test the performance of our likelihood method, we calculated probabilities using Eqn. (10) on single-epoch data in regions of Stripe 82 with high spectroscopic completeness and compared that target list with the BOSS "truth table" (which includes targets from all targeting methods, quasars targeted using variability, and all previously known quasars). This is a fair test because targeting in this region was conducted using co-added photometry, and thus we are not testing the likelihood method on a region that was targeted with the same photometry.

Fig. 2. Right Ascension (RA) versus Declination (Dec) of BOSS QSO data used for the likelihood method testing and luminosity function testing. Testing was done in the Stripe 82 calibration band, in regions of high (> 90%) spectroscopic completeness. The blue points are spectroscopically confirmed quasars and the yellow regions are the sky tiles that were observed. Note that the vertical and horizontal scales are not the same.

Fig. 3. Color-color diagrams of BOSS QSOs recovered by the likelihood method (magenta), false-negative QSOs that were not targeted (missed) by the likelihood method (cyan), and false-positive stars that were wrongly targeted by the likelihood method (red). These plots show recovered/missed (z > 2.2) QSOs. Comparing these plots with Fig. 1 shows that the problematic region for likelihood targeting is where the two catalogs overlap, near u − g = 1, g − r = 0.25. For context, the QSO Catalog and EE Catalog contours from Fig. 1 are included in the plots. The error bars are the SDSS single-epoch g − r and u − g magnitude errors at g = 22 (black) and g = 20 (grey). The targeting decisions were computed in flux space rather than in the color space shown in the figures.

Likelihood probabilities were calculated for 592,847 objects; of those, the top 7,757 were selected (P > 0.245) for a target density of 40 objects per deg². Fig. 4 shows the distribution of the probabilities for the recovered² and false-negative (missed) QSOs, as well as for the false-positive stars (wrongly) targeted by the method. We found an efficiency (E = recovered QSOs / total targets) of 40% and a completeness (C = likelihood-recovered QSOs / total BOSS-recovered QSOs) of 65%. Fig. 3 shows ugr color-color plots of BOSS quasars recovered (magenta) and missed (cyan) by the likelihood method, as well as false-positive contaminating stars (red) that were targeted by the method.

Fig. 4. The probability (P) distributions of the quasars recovered by the likelihood method (magenta, 4617 total), false-negative QSOs that were missed by the likelihood method (cyan, 1566 total), and false-positive stars that were incorrectly targeted by the likelihood method (red, 5743 total). The vertical gray dashed line shows the likelihood P threshold used for targeting (P > 0.245). The spike around P ≈ 0 in the cyan curve consists of quasars that fall in the midst of the stellar locus and are therefore found by the method to have a very low probability of being QSOs. Most of these quasars are targeted because they are previously spectroscopically confirmed QSOs or because of their flux variability. The probability distribution of the untargeted stars (true negatives) is not included in the plot; these constitute an additional 742,662 objects.

Fig. 5. The redshift distributions of the QSOs recovered by the likelihood method (magenta) and of the false-negative QSOs that were not targeted (missed) by the likelihood method (cyan), compared with SDSS DR5 QSOs (blue).

² We define recovered/missed QSOs to be quasars in the desired BOSS redshift range (z > 2.2).
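The ranking step can be sketched as follows, assuming each potential target is represented by an identifier and its probability from Eqn. (10); the helper names and the toy data are hypothetical. N is set to the target density times the survey area, and the probability of the last object kept plays the role of the P threshold quoted above. Efficiency and completeness then follow from intersecting the selection with the truth table.

```python
from typing import Iterable, List, Sequence, Set, Tuple


def select_top(probs: Sequence[Tuple[str, float]], density: float,
               area_deg2: float) -> Tuple[List[str], float]:
    """Keep the N = density * area highest-probability objects.

    Returns the selected ids and the probability of the last object kept,
    which acts as the effective P threshold.
    """
    n = int(round(density * area_deg2))
    ranked = sorted(probs, key=lambda item: item[1], reverse=True)[:n]
    threshold = ranked[-1][1] if ranked else float("nan")
    return [obj_id for obj_id, _ in ranked], threshold


def efficiency_completeness(selected: Iterable[str],
                            truth: Set[str]) -> Tuple[float, float]:
    """E = hits / targets and C = hits / all known z > 2.2 QSOs (the truth table)."""
    chosen = set(selected)
    hits = len(chosen & truth)
    eff = hits / len(chosen) if chosen else 0.0
    comp = hits / len(truth) if truth else 0.0
    return eff, comp


# Toy example: 4 objects over 1 deg^2, targeting 2 per deg^2
probs = [("a", 0.9), ("b", 0.8), ("c", 0.3), ("d", 0.1)]
ids, p_cut = select_top(probs, density=2.0, area_deg2=1.0)
print(ids, p_cut)                                # ['a', 'b'] 0.8
print(efficiency_completeness(ids, {"a", "c"}))  # (0.5, 0.5)
```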

There is, of course, an inevitable trade-off between E and C. The more fibers given to quasar targets, the more QSOs are found (greater C), but the accuracy of targeting a quasar decreases (lower E). This is shown in Table 1, where the rate of newly targeted QSOs steadily decreases as a function of targets deg⁻².

By comparing the target objects to the catalog contours in Fig. 3, it is clear that the likelihood method fails mostly in the region of overlap between the two catalogs. Fig. 5 shows the redshift distributions of the targeted and missed quasars, and the limitation of the SDSS DR5 catalog at z > 2. Table 1 shows the detailed testing results.
TABLE 1
Likelihood Stripe 82 Results, Jiang et al. (2006) Luminosity Function

Targets per deg² | Likelihood P threshold | Total targets | QSOs recovered | QSOs missed | C (%) | E (%)
5 | 0.974 | 969 | 669 | 3811 | 15 | 69
10 | 0.833 | 1938 | 1276 | 3227 | 28 | 66
20 | 0.535 | 3878 | 2166 | 2401 | 47 | 56
40 | 0.245 | 7757 | 3087 | 1657 | 65 | 40
60 | 0.136 | 11636 | 3595 | 1331 | 73 | 31
80 | 0.088 | 15515 | 3965 | 1108 | 78 | 26
100 | 0.063 | 19394 | 4219 | 980 | 81 | 22
140 | 0.037 | 27152 | 4618 | 806 | 85 | 17

Note: E and C as a function of dedicated target fibers (targets deg⁻²). These values are for z > 2.2 recovered/missed QSOs. There is a trade-off: the more fibers given to targets, the more QSOs are found (greater C), but the accuracy of finding a quasar decreases (lower E). The values for the threshold, C and E will of course depend on Galactic latitude (Ross et al. 2011a). BOSS year-one data were targeted using the likelihood method at 20 targets deg⁻² for the CORE sample.
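The C and E columns of Table 1 follow directly from the counts. A short Python check, which also computes the number of new z > 2.2 quasars per additional target between consecutive rows (a derived quantity, not tabulated in the paper), makes the diminishing returns explicit: the marginal yield falls from about 0.63 new QSOs per added target (5 to 10 deg⁻²) to about 0.05 (100 to 140 deg⁻²).

```python
from typing import List, Tuple

# (targets per deg^2, total targets, QSOs recovered, QSOs missed), from Table 1
TABLE1 = [
    (5, 969, 669, 3811),
    (10, 1938, 1276, 3227),
    (20, 3878, 2166, 2401),
    (40, 7757, 3087, 1657),
    (60, 11636, 3595, 1331),
    (80, 15515, 3965, 1108),
    (100, 19394, 4219, 980),
    (140, 27152, 4618, 806),
]


def efficiency(total: int, recovered: int) -> float:
    """E = recovered QSOs / total targets, in percent."""
    return 100.0 * recovered / total


def completeness(recovered: int, missed: int) -> float:
    """C = recovered / (recovered + missed), in percent."""
    return 100.0 * recovered / (recovered + missed)


def marginal_yield(rows) -> List[Tuple[int, float]]:
    """New z > 2.2 QSOs per additional target between consecutive densities."""
    return [(b[0], (b[2] - a[2]) / (b[1] - a[1])) for a, b in zip(rows, rows[1:])]


for density, total, rec, miss in TABLE1:
    print(density, round(completeness(rec, miss)), round(efficiency(total, rec)))
print(marginal_yield(TABLE1))
```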

4. TESTING, IMPROVEMENTS AND CONCLUSIONS 

4.1. Likelihood versus Color-Box 

In order to see how our likelihood method performs against traditional "color-box" selection, we compared the number of z > 2.2 quasars the likelihood method was able to recover with the number recovered by a simple color-box selection, using the BOSS data on Stripe 82 (the same data set as in Section 3.2). We note that our color box, described below, is a relatively simple selection in (u − g) versus (g − r) color space only, designed to adequately sample the region where z ≳ 2.1 QSOs reside. This color box is not the same as the "inclusion region" of Richards et al. (2002) (their Section 3.5.2 and Fig. 7), but the (g − r) < 0.43 · (u − g) cut is inspired by their high-redshift color selection.
The color cuts we used for our tests are:




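$$
\begin{aligned}
&\big[\,(g-r) < -0.13\,(u-g) - A \;\;\text{and}\;\; (u-g) > 0.3 \;\;\text{and}\;\; r < 22.0\,\big] \\
&\quad\text{or} \\
&\big[\,(g-r) < 0.43\,(u-g) + B \;\;\text{and}\;\; 0.3 < (u-g) < 2.0 \;\;\text{and}\;\; r < 22.0\,\big]
\end{aligned} \tag{11}
$$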



Fig. 6. A color-color diagram (horizontal axis: u − g) showing the cuts applied for color-box selection. The red points are potential targets that were not selected by the color-box method. The other colored regions are targets selected by the cuts in Eqn. (11) with the different offsets from Eqn. (12). The color-box selection cuts with the most targets are at the top (closest to the stellar locus, shown as grey contours), and the cuts get more restrictive moving towards the x-axis.



where the offsets A and B in Eqn. (11) are defined as:

$$
A = 0.01\,k - 0.32, \qquad B = -0.01\,k - 0.28 \tag{12}
$$

and k is an integer varying from 0 to 29, k = [0, 1, 2, ..., 27, 28, 29].
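A sketch of the color-box selection of Eqns. (11) and (12) for a single object, written directly from the equations as they appear here; the function name and toy magnitudes are hypothetical. Because increasing k lowers both boundary lines, the selected sample can only shrink as k grows, which the toy count illustrates.

```python
def color_box_selected(u: float, g: float, r: float, k: int) -> bool:
    """Color-box cuts of Eqn. (11) with the offsets of Eqn. (12).

    u, g, r are magnitudes; k = 0 is the most inclusive box and k = 29 the
    most restrictive.
    """
    if not 0 <= k <= 29:
        raise ValueError("k must be an integer between 0 and 29")
    a = 0.01 * k - 0.32          # offset A, Eqn. (12)
    b = -0.01 * k - 0.28         # offset B, Eqn. (12)
    ug, gr = u - g, g - r
    box1 = gr < -0.13 * ug - a and ug > 0.3 and r < 22.0
    box2 = gr < 0.43 * ug + b and 0.3 < ug < 2.0 and r < 22.0
    return box1 or box2


# Number of toy objects (u, g, r) selected for each k
toy = [(21.0, 20.0, 19.9), (21.5, 20.3, 20.1), (20.0, 19.5, 19.0), (22.0, 20.9, 21.0)]
counts = [sum(color_box_selected(u, g, r, k) for u, g, r in toy) for k in range(30)]
print(counts[0], counts[29])
```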

In Fig. 6 we show the above color cuts applied to the Stripe 82 potential quasar targets. The red points are targets that are not selected by the color box. The points of other colors at the bottom center of the figure are targets that were selected by the cuts in Eqn. (11) with the different offsets from Eqn. (12). The color-box selection cuts with the most targets (k = 0) are closest to the stellar locus (shown as grey contours). The cuts then get more restrictive going down towards the x-axis, where fewer and fewer targets are selected.

Fig. 7 shows the number of z > 2.2 quasars recovered by the above color cuts compared with the number recovered by the likelihood method. The color-box results are consistent with Table 8 of Ross et al. (2011b), which shows 6.45 mid-z quasars deg⁻² recovered from 20 targets deg⁻² using their (slightly different) color-box selection. Likelihood outperforms the color-box selection method, recovering over twice as many BOSS quasars at 20 targets deg⁻².




Fig. 7. The number of BOSS z > 2.2 quasars recovered as a function of targets deg⁻² for the likelihood method (red) and a traditional color-box technique (blue). Likelihood outperforms the color-box selection method, recovering over twice as many BOSS quasars at 20 targets deg⁻².




Fig. 8. (Left) Redshift distributions for the Jiang et al. (2006) (red), Richards et al. (2006) (blue) and Hopkins et al. (2007) (green) luminosity functions. (Center) The number of BOSS z > 2.2 quasars recovered as a function of targets deg⁻² for the three luminosity functions (i.e., different priors). The performance of all three LFs is almost identical for target densities up to ~20 targets deg⁻², at which point the Richards model starts to perform slightly worse. The gray line shows 100% efficiency and emphasizes that very high efficiency is achieved if a small number of targets are selected. (Right) The redshift distributions of the recovered z > 2.2 quasars for the three luminosity functions.

TABLE 2
Likelihood Luminosity Function Testing

Luminosity function | Targets per deg² | Likelihood P threshold | Total targets | QSOs recovered | QSOs missed | C (%) | E (%)
Jiang et al. (2006) | 20 | 0.535 | 3878 | 2166 | 2401 | 47 | 56
Jiang et al. (2006) | 40 | 0.245 | 7757 | 3087 | 1657 | 65 | 40
Richards et al. (2006) | 20 | 0.237 | 3978 | 2089 | 2466 | 46 | 54
Richards et al. (2006) | 40 | 0.079 | 7757 | 2939 | 1808 | 62 | 37
Hopkins et al. (2007) | 20 | 0.383 | 3878 | 2154 | 2401 | 47 | 56
Hopkins et al. (2007) | 40 | 0.158 | 7737 | 3046 | 1676 | 64 | 39

Note: Completeness (C) and efficiency (E) as a function of dedicated target fibers (targets deg⁻²) for three different luminosity functions. The three LFs tested are from Jiang et al. (2006), Richards et al. (2006), and Hopkins et al. (2007). These values are for z > 2.2 recovered/missed QSOs.

4.2. Luminosity Function Testing 

We tested the performance of three different quasar
luminosity functions (LFs) as inputs to the QSO Cat-
alog. The LFs enter into the generation of the QSO
Catalog by determining the density of quasars as a func-
tion of redshift and z-band magnitude. All other
details of the Monte Carlo remain the same as described
in Section 2.3. The EE Catalog does not depend on
these LFs, so it stays the same for these tests.
The three functions tested are from Jiang et al. (2006),
Richards et al. (2006), and Hopkins et al. (2007). The
inputs and results of this testing are shown in Fig. 8.

The quasar redshift distributions for these three lumi-
nosity functions are shown in the left panel of Fig. 8.
The performance of the method did not change signifi-
cantly for the three different LF priors. Fig. 8 (Center)
shows the number of quasars successfully recovered as a
function of the number of dedicated QSO target fibers
per deg^2. Note the shape of this function: the rate of
newly recovered quasars drops off significantly beyond
40 targets deg^-2. The performance of all three LFs is
essentially identical up to 20 targets deg^-2. The red-
shift distributions of the recovered quasars are slightly
different, as shown in Fig. 8 (Right), so the choice of
LF could be used to help tune the targeting redshifts.
For BOSS targeting, however, this method is not sensitive
to uncertainties in the luminosity function.
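The diminishing return from additional fibers can be quantified with the Jiang et al. rows of Table 2: the marginal efficiency of the second block of 20 targets deg^-2 is the number of extra recovered quasars divided by the number of extra targets. A minimal sketch, also using the recovered counts at 20 targets deg^-2 for all three LF priors from Table 2 to gauge how much the priors differ:

```python
# Counts from Table 2 (Jiang et al. rows): total targets and z > 2.2 QSOs recovered.
targets = {20: 3878, 40: 7757}
recovered = {20: 2166, 40: 3087}

# Efficiency of the first 20 targets per deg^2.
first_block = recovered[20] / targets[20]
# Marginal efficiency of the next 20 targets per deg^2 (extra QSOs / extra targets).
second_block = (recovered[40] - recovered[20]) / (targets[40] - targets[20])

# Recovered QSOs at 20 targets per deg^2 for the three LF priors (Table 2).
at_20 = {"Jiang": 2166, "Richards": 2089, "Hopkins": 2154}
spread = (max(at_20.values()) - min(at_20.values())) / max(at_20.values())

print(f"first 20/deg^2: {first_block:.1%}, next 20/deg^2: {second_block:.1%}")
print(f"spread between LF priors at 20/deg^2: {spread:.1%}")
```

The second block of fibers is less than half as efficient as the first, while the three priors differ by only a few percent in recovered quasars at 20 targets deg^-2.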



Ultimately it was decided that Jiang et al. (2006) was
the best luminosity function for our purposes because
it was more efficient at recovering z > 2.2 QSOs than
Richards et al. (2006) and Hopkins et al. (2007). More
detailed values for the performance of these different
luminosity functions can be seen in Table 2.

While we have not made a proper comparison of the
redshift distributions of BOSS quasars with those of
these luminosity functions, a future improvement would
be to use, as the input to the Monte Carlo, a luminosity
function generated from the redshift distribution of BOSS
quasars (properly adjusted for the targeting selection
function imprinted upon it) and to test whether this
approach improves target selection. Another promising
modification would be to add the photometry of BOSS
quasars to the inputs of the Monte Carlo simulation.

4.3. Weighted Likelihoods

We also tested adjusting the QSO likelihood, L_QSO(Δz),
to incorporate a weighting factor that optimizes (in
redshift-magnitude space) the selection of objects with a
high dark energy figure of merit (Albrecht et al. 2006).
This weighting is done by simply multiplying each term
of the sum by a factor w_q' based on the QSO Catalog
quasar's flux (f_q') and redshift (z):

$$\mathcal{L}_{\rm QSO}(\Delta z) = \sum_{q' \in {\rm QSO}(\Delta z)} w_{q'} \prod_{f} \frac{1}{\sqrt{2\pi}\,\sigma_f} \exp\left[-\frac{\left(f_{\rm obj} - f_{q'}\right)^2}{2\sigma_f^2}\right] \qquad (13)$$

where the product runs over the photometric bands f, and f_obj and σ_f are the flux and flux error of the object being evaluated in that band.



We tested adjusting the likelihood method in this man-
ner with weights (w_q') calculated by Pat McDonald (pri-
vate communication; see Fig. 10 and Table 3). Here the
weight is determined by the contribution of the quasar's
LyaF to the BAO signal; a higher weight yields a higher
signal-to-noise BAO measurement.

Fig. 9. (Left) - The number of BOSS z > 2.2 quasars recovered as a function of targets deg^-2 for the likelihood method with and
without weights. The gray line shows 100% efficiency, and emphasizes that very high efficiency is achieved if a small number of targets are
selected. (Center) - The redshift distributions of the recovered z > 2.2 quasars. (Right) - The weight distributions of the recovered z > 2.2
quasars. Notice that using the likelihood + weights recovers QSOs with a higher BAO weight.

TABLE 3
McDonald Weights

                                       Redshift (z)
r-mag   2.0    2.25   2.5    2.75   3.0    3.25   3.5    3.75   4.0    4.25   4.5
17.5    0.00   0.487  0.822  0.713  0.570  0.441  0.334  0.250  0.162  0.162  0.162
18.1    0.00   0.476  0.818  0.712  0.569  0.441  0.334  0.250  0.161  0.161  0.161
18.7    0.00   0.464  0.814  0.710  0.568  0.440  0.333  0.250  0.161  0.161  0.161
19.3    0.00   0.446  0.807  0.708  0.567  0.439  0.333  0.250  0.161  0.161  0.161
19.9    0.00   0.411  0.794  0.704  0.564  0.438  0.332  0.249  0.161  0.161  0.161
20.5    0.00   0.350  0.758  0.692  0.557  0.434  0.330  0.247  0.159  0.159  0.159
21.1    0.00   0.288  0.698  0.665  0.541  0.424  0.323  0.242  0.155  0.155  0.155
21.7    0.00   0.198  0.565  0.597  0.499  0.401  0.307  0.227  0.144  0.144  0.144
22.3    0.00   0.113  0.381  0.446  0.388  0.331  0.269  0.196  0.118  0.118  0.118
22.9    0.00   0.041  0.187  0.254  0.247  0.231  0.173  0.109  0.061  0.061  0.061
23.5    0.00   0.008  0.045  0.054  0.049  0.046  0.026  0.015  0.004  0.004  0.004

Note. Shown is a subsample of the values for the weights in Eq. (13). For a full-length,
downloadable table of weights, please see the electronic version of this paper.

The weight is a functional derivative of the overall BAO 
distance error squared with respect to the luminosity 
function. It can therefore be integrated over any achieved 
luminosity function (or summed over a set of quasars) to 
produce an estimate proportional to the BAO distance 
error squared that one would expect to achieve from that 
data set. There are two relevant factors affecting the 
value of a quasar: the noise level in the spectrum, and 
the density of quasars at a given redshift. The low-redshift
cutoff comes primarily from the degradation in the
signal-to-noise ratio at the blue end of the spectrograph, 
while the high-z tail-off comes from the diminishing den- 
sity of quasars with which to perform a cross-correlation 
for LyaF calculations. 
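Because the weights can be summed over a set of quasars, two candidate samples can be compared by their total weight. The sketch below uses a small subset of Table 3 and bilinear interpolation between grid points; the interpolation scheme is an assumption (how weights between grid points were evaluated is not stated here), and `mcdonald_weight` and both samples are hypothetical. Only the relative sizes of the sums are compared; no absolute normalization is implied.

```python
from bisect import bisect_right

# Subset of Table 3 (McDonald weights): rows are r-band magnitude, columns redshift.
R_MAG = [20.5, 21.1, 21.7]
Z = [2.0, 2.25, 2.5, 2.75, 3.0]
WEIGHTS = [
    [0.00, 0.350, 0.758, 0.692, 0.557],  # r = 20.5
    [0.00, 0.288, 0.698, 0.665, 0.541],  # r = 21.1
    [0.00, 0.198, 0.565, 0.597, 0.499],  # r = 21.7
]


def _cell(grid, x):
    """Return (lower index, fractional position) of x in grid, clamped to the grid range."""
    x = min(max(x, grid[0]), grid[-1])
    i = min(max(bisect_right(grid, x) - 1, 0), len(grid) - 2)
    return i, (x - grid[i]) / (grid[i + 1] - grid[i])


def mcdonald_weight(r_mag, z):
    """Bilinear interpolation of the tabulated weight (hypothetical helper)."""
    i, t = _cell(R_MAG, r_mag)
    j, u = _cell(Z, z)
    return ((1 - t) * (1 - u) * WEIGHTS[i][j] + (1 - t) * u * WEIGHTS[i][j + 1]
            + t * (1 - u) * WEIGHTS[i + 1][j] + t * u * WEIGHTS[i + 1][j + 1])


# Hypothetical samples of recovered quasars as (r-mag, redshift) pairs.
sample_a = [(20.6, 2.45), (21.0, 2.6), (21.5, 2.9)]
sample_b = [(21.6, 2.1), (21.7, 2.2), (21.4, 2.25)]
total_a = sum(mcdonald_weight(r, z) for r, z in sample_a)
total_b = sum(mcdonald_weight(r, z) for r, z in sample_b)
print(f"summed weight: A = {total_a:.3f}, B = {total_b:.3f}")
```

Sample A, which is brighter and lies near z ~ 2.5, carries far more summed weight than the fainter, lower-redshift sample B, mirroring the trends described in the caption of Fig. 10.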

While using these weights did recover QSOs with a
higher average BAO signal, fewer total quasars were
recovered with this scheme, as expected (see Fig. 9).
Ultimately it was decided to optimize the number of re- 
covered quasars rather than the BAO signal for BOSS 
target selection. Therefore this weighting scheme was not 
used in the final likelihood targeting algorithm. However, 
depending on the goals of the user, a weighting scheme 
could be useful for future targeting purposes. 
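The effect of the weighting in Eq. (13) on the ranking of objects can be sketched with a toy catalog. The catalog fluxes (arbitrary units), the flux errors, and the weights below are hypothetical, and σ_f is taken to be the evaluated object's flux error in each band. Two objects that are indistinguishable under the unweighted sum are separated once the weights are applied, which is how the weighting shifts selection toward higher-weight quasars.

```python
import math


def band_product(obj_flux, obj_err, cat_flux):
    """Product over bands of Gaussian terms comparing an object with one catalog quasar."""
    p = 1.0
    for f_obj, sigma, f_cat in zip(obj_flux, obj_err, cat_flux):
        p *= math.exp(-((f_obj - f_cat) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)
    return p


def qso_likelihood(obj_flux, obj_err, catalog, weights=None):
    """Sum over catalog quasars; with weights, each term is scaled as in Eq. (13)."""
    if weights is None:
        weights = [1.0] * len(catalog)
    return sum(w * band_product(obj_flux, obj_err, q) for w, q in zip(weights, catalog))


# Toy catalog of two quasars with three-band fluxes and hypothetical weights.
catalog = [[1.0, 2.0, 3.0], [5.0, 6.0, 7.0]]
weights = [0.8, 0.1]
err = [0.5, 0.5, 0.5]
obj_a, obj_b = [1.0, 2.0, 3.0], [5.0, 6.0, 7.0]

plain = (qso_likelihood(obj_a, err, catalog), qso_likelihood(obj_b, err, catalog))
weighted = (qso_likelihood(obj_a, err, catalog, weights),
            qso_likelihood(obj_b, err, catalog, weights))
print("unweighted:", plain)
print("weighted:  ", weighted)
```

With all weights equal to 1 the weighted sum reduces to the unweighted one, so the same code covers both targeting variants.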

4.4. Likelihood and XDQSO 

Fig. 10. The McDonald weight (or effectiveness as a dark
energy BAO probe) of a QSO as a function of magnitude and
redshift. The lines are at 0.1 magnitude intervals. Brighter quasars
have a higher weight, as do quasars near z ~ 2.5.
For a detailed table of the numbers in this plot, see Table 3.

The likelihood method inspired a similar target-
ing approach, extreme-deconvolution quasar targeting
(XDQSO; Bovy et al. 2011). The training sets used
in XDQSO are almost identical to the QSO and EE
Catalogs used in the likelihood method. XDQSO uses
an extreme-deconvolution fit to these catalogs, such that
they are represented by a small set of Gaussian distribu-
tions instead of a large set of discrete objects. Likelihood
calculates probabilities as a straightforward sum over all
objects, whereas XDQSO performs a more algorithmically
complicated fit. However, once the catalog Gaussians are
determined, XDQSO probabilities are much faster to cal-
culate than those in the likelihood method. In the limit
where the QSO and EE Catalogs are extremely large
and their photometric errors are very small, the two al-
gorithms are essentially equivalent. For small, noisy
training sets, however, likelihood represents the continuous
color distribution of quasars as discrete delta functions in
color space, which can produce noisier results. A study of
the performance of the two methods shows that they are
comparable in BOSS targeting efficiency, with XDQSO
finding ~1 additional QSO deg^-2 at a given threshold
(Bovy et al. 2011). The targeting success is similar
because the training sets used in likelihood are large and
use high signal-to-noise data.
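The contrast between summing over discrete training objects and evaluating a fitted density can be illustrated in one dimension. In this toy sketch a single fitted Gaussian stands in for XDQSO's mixture; it is not extreme deconvolution (which fits several Gaussians while deconvolving per-object errors), and the training values are noise-free draws from a unit Gaussian. The discrete sum is divided by N, which only rescales it. All names and values are hypothetical.

```python
import math
import random
import statistics


def normal_pdf(x, mu, var):
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def discrete_likelihood(x, training, sigma):
    """Likelihood-style: one Gaussian (width = object's error) per training object, averaged."""
    return sum(normal_pdf(x, t, sigma ** 2) for t in training) / len(training)


def fitted_likelihood(x, training, sigma):
    """One Gaussian fitted to the training set, convolved with the object's error."""
    mu = statistics.fmean(training)
    var = statistics.pvariance(training, mu)
    return normal_pdf(x, mu, var + sigma ** 2)


random.seed(1)
x, sigma = 0.5, 0.3  # object "color" and its measurement error (toy values)
large = [random.gauss(0.0, 1.0) for _ in range(20000)]
small = large[:20]

ratio_large = discrete_likelihood(x, large, sigma) / fitted_likelihood(x, large, sigma)
ratio_small = discrete_likelihood(x, small, sigma) / fitted_likelihood(x, small, sigma)
print(f"N=20000: ratio {ratio_large:.3f}   N=20: ratio {ratio_small:.3f}")
```

With 20,000 training objects the two estimates agree closely; with 20, the discrete sum depends on where the few training points happen to fall.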

4.5. Summary and Conclusions 
In this paper we: 

• Developed a new method for quasar target selec- 
tion using photometric fluxes and a Bayesian prob- 
abilistic approach; 

• Demonstrated that this leads to the recovery of
15.9 z > 2.2 quasars deg^-2 from the SDSS Stripe
82 dataset when targeting at 40 targets deg^-2, with
a completeness of 65% and an efficiency of 40%;

• Showed that the likelihood method recovers twice 
as many quasars as traditional "color-box" selec- 
tion; 

• Tested the effects of changing the input QSO
Catalog by using different luminosity functions, and
of adding a weighting scheme to the likelihood cal-
culations.

The likelihood method can easily be extended to in-
clude other attributes (a) in addition to the photomet-
ric fluxes in the exponentials in Eq. (7) and Eq. (8).
In addition, variability information, which has already
been demonstrated to be useful in quasar target selection
(Schmidt et al. 2010; MacLeod et al. 2011; Palanque-
Delabrouille et al. 2011), could be incorporated. Simi-
larly, extending this method to include more color filters
could help with target selection, as shown by Richards
et al. (2009).
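Adding an attribute amounts to one more Gaussian factor in each term of the likelihood sums. The sketch below assumes a hypothetical variability amplitude `var_amp` alongside two fluxes, uses toy catalogs in which a quasar and a star share identical fluxes, and combines the likelihoods in the form L_QSO / (L_QSO + L_EE), which is assumed here for illustration only.

```python
import math


def term(obj, obj_err, cat_obj):
    """Gaussian product over every attribute listed in obj_err (fluxes or extra attributes)."""
    p = 1.0
    for key, sigma in obj_err.items():
        d = obj[key] - cat_obj[key]
        p *= math.exp(-d * d / (2 * sigma * sigma)) / (math.sqrt(2 * math.pi) * sigma)
    return p


def likelihood(obj, obj_err, catalog):
    return sum(term(obj, obj_err, c) for c in catalog)


def qso_probability(obj, obj_err, qso_catalog, ee_catalog):
    """Assumed form L_QSO / (L_QSO + L_EE), for illustration only."""
    l_qso = likelihood(obj, obj_err, qso_catalog)
    l_ee = likelihood(obj, obj_err, ee_catalog)
    return l_qso / (l_qso + l_ee)


# Toy catalogs: identical fluxes, but the quasar varies and the star does not.
qso_catalog = [{"g": 1.0, "r": 1.2, "var_amp": 0.30}]
ee_catalog = [{"g": 1.0, "r": 1.2, "var_amp": 0.02}]
obj = {"g": 1.0, "r": 1.2, "var_amp": 0.28}

flux_only = {"g": 0.1, "r": 0.1}
with_variability = {"g": 0.1, "r": 0.1, "var_amp": 0.05}

p_flux = qso_probability(obj, flux_only, qso_catalog, ee_catalog)
p_var = qso_probability(obj, with_variability, qso_catalog, ee_catalog)
print(f"fluxes only: P = {p_flux:.3f}; fluxes + variability: P = {p_var:.3f}")
```

With fluxes alone the two catalogs cannot be distinguished (P = 0.5); including the extra attribute separates them without changing the structure of the calculation.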



After a commissioning period in September-November
2009, the QSO targeting fibers were divided into a uni-
formly selected CORE sample and a non-uniformly se-
lected BONUS sample (Ross et al. 2011a). The likeli-
hood method, using the Jiang luminosity function, was
used for targeting the CORE sample (20 targets deg^-2)
for the first year of BOSS data taking. The rest of the
fibers (BONUS sample) were targeted by combining the
outputs of the likelihood, KDE and NN methods
using a neural network. This approach allows us to com-
bine both different methods and different photometric
catalogs (SDSS, UKIDSS, GALEX) in the BONUS se-
lection.

After the first year of data taking, CORE sample
targeting switched to the XDQSO method (Bovy
et al. 2011), and likelihood was then used in the BONUS
sample as well as one of the inputs to the NN. This switch
was made because XDQSO performed slightly better at
recovering high-redshift quasars, and the priority of the
target selection team was to maximize the number of se-
lected quasars. We plan to release the probabilities from
Eq. (10) in the project's SDSS data releases.